A central question in our understanding of the physical world is how our knowledge of the whole 
relates to our knowledge of the individual parts. One aspect of this question is the following: to 
what extent does ignorance about a whole preclude knowledge of at least one of its parts? Relying 
purely on classical intuition, one would certainly be inclined to conjecture that a strong ignorance 
of the whole cannot come without significant ignorance of at least one of its parts. Indeed, we 
show that this reasoning holds in any non-contextual hidden variable model (NC-HV). Curiously, 
however, such a conjecture is false in quantum theory: we provide an explicit example where a large 
ignorance about the whole can coexist with an almost perfect knowledge of each of its parts. More 
specifically, we provide a simple information-theoretic inequality satisfied in any NC-HV, but which 
can be arbitrarily violated by quantum mechanics. Our inequality has interesting implications for 
quantum cryptography. 



In this note we examine the following seemingly inno- 
cent question: does one's ignorance about the whole nec- 
essarily imply ignorance about at least one of its parts? 
Given just a moment's thought, the initial reaction is generally
to give a positive answer. Surely, if one cannot
know the whole, then one should be able to point to an 
unknown part. Classically, and more generally for any 
deterministic non-contextual hidden variable model, our 
intuition turns out to be correct: ignorance about the 
whole does indeed imply the existence of a specific part 
which is unknown, so that one can point to the source of 
one's ignorance. However, we will show that in a quan- 
tum world this intuition is flawed. 

THE PROBLEM 

Let us first explain our problem more formally. Consider
two dits $y_0$ and $y_1 \in \{0,\dots,d-1\}$, where the string
$y = y_0y_1$ plays the role of the whole, and $y_0$, $y_1$ are the
individual parts. Let $\rho_y$ denote an encoding of the string
y into a classical or quantum state. In quantum theory,
$\rho_y$ is simply a density operator, and in a NC-HV model
it is a preparation $\mathcal{P}_y$ described by a probability distribution
over hidden variables $\lambda \in \Lambda$. Let $P_Y$ be a probability
distribution over $\{0,\dots,d-1\}^2$, and imagine that with
probability $P_Y(y)$ we are given the state $\rho_y$. The optimum
probability of guessing y given its encoding $\rho_y$,
which lies in a register E, can be written as

$$P_{\rm guess}(Y|E) = \max_{\mathcal{M}} \sum_{y \in \{0,\dots,d-1\}^2} P_Y(y)\, p(y|\mathcal{M},\mathcal{P}_y) , \tag{1}$$

where $p(y|\mathcal{M},\mathcal{P}_y)$ is the probability of obtaining outcome
y when measuring the preparation $\mathcal{P}_y$ with $\mathcal{M}$,
and the maximization is taken over all $d^2$-outcome measurements
allowed in the theory. In the case of quantum
theory, for example, the maximization is taken over
POVMs $\mathcal{M} = \{M_y\}_y$ and $p(y|\mathcal{M},\mathcal{P}_y) = \mathrm{tr}(M_y \rho_y)$. The
guessing probability is directly related to the conditional
min-entropy $H_\infty(Y|E)$ through the equation [2]

$$H_\infty(Y|E) := -\log P_{\rm guess}(Y|E) . \tag{2}$$

This measure plays an important role in quantum cryp- 
tography and is the relevant measure of information in 
the single shot setting corresponding to our everyday ex- 
perience, as opposed to the asymptotic setting captured 
by the von Neumann entropy. A closely related variant is
the smooth min-entropy $H^\varepsilon_\infty(Y|E)$, which can be thought
of as being like $H_\infty(Y|E)$ except with some small error
probability $\varepsilon$. The main question we are interested in can
then be loosely phrased as:

How does $H_\infty(Y = Y_0Y_1|E)$ (ignorance about
the whole) relate to $H_\infty(Y_C|EC)$, for $C \in \{0,1\}$
(ignorance about the parts)?

Here the introduction of the additional random vari- 
able C is crucial, and it can be understood as a pointer to 
the part of Y about which there is large ignorance (given 
a large ignorance of the whole string Y); see Figure 1
for an illustration of this role. It is important to note
that the choice of C should be consistent with the encoding
prior to its definition. That is, whereas C may of
course depend on $Y_0$, $Y_1$ and the encoding E, the reduced
state on registers holding $Y_0$, $Y_1$ and E after tracing out
C should remain the same. In particular, this condi- 
tion states that C cannot be the result of a measurement 
causing disturbance to the encoding register; if we were 
allowed to destroy information in the encoding we would 
effectively alter the original situation.

RESULTS

An inequality valid in any NC-HV model. 

We first show that classically, or more generally in
any non-contextual hidden variable model [22], ignorance
about the whole really does imply ignorance about a part.
More specifically, we show that for any random variable
$Y = Y_0Y_1$ and side information E, there exists a random
variable $C \in \{0,1\}$ such that

$$H_\infty(Y_C|EC) \gtrsim \frac{H_\infty(Y_0Y_1|E)}{2} . \tag{3}$$



This inequality can be understood as an information-theoretic
analogue of Bell inequalities to the question of
non-contextuality. Classically, this inequality is known
as the min-entropy splitting inequality, and plays an important
role in the proof of security of some (classical)
cryptographic primitives [3J H]. The proof of (3) is a
straightforward extension to the case of standard NC-HV
models [8] of a classical technique known as min-entropy
splitting first introduced by Wullschleger [3], and
we defer details to the appendix.

The fact that C is a random variable, rather than being
deterministically chosen, is important, and an example
will help clarify its role. Consider Y uniformly
distributed over $\{0,\dots,d-1\}^2$ and $E = Y_0$ with probability
1/2, and $E = Y_1$ with probability 1/2. In this case it
is easy to see that both $Y_0$ and $Y_1$ can be guessed from
E with average success probability $1/2 + 1/(2d)$, so that
$H_\infty(Y_0|E) = H_\infty(Y_1|E) \approx 1$, which is much less than
$H_\infty(Y|E) \approx \log d$. However, define C as 0 if $E = Y_1$ and
1 if $E = Y_0$. Then it is clear that $H_\infty(Y_C|EC) = \log d$, as
we are always asked to predict the variable about which
we have no side information at all! In this case the random
variable C "points to the unknown" by being correlated
with the side information E, but is entirely consistent
with our knowledge about the world: by tracing
out C we recover the initial joint distribution on (Y, E).
This also highlights the important difference between the
task we are considering and the well-studied random access
codes [5], in which the requirement is to be able
to predict one of $Y_0$, $Y_1$ (adversarially chosen) from their
encoding; for this task it has been demonstrated that
there is virtually no asymptotic difference between classical
and quantum encodings (see below for a discussion).
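
The numbers quoted in this classical example can be checked by brute-force enumeration. The following Python sketch is only an illustration under stated assumptions: the function names are hypothetical, and E is modelled as the bare value of the revealed dit, with no flag saying which of the two dits it is. It uses the fact that for classical side information the guessing probability is the sum, over side-information values, of the largest joint probability.

```python
import itertools
from collections import defaultdict


def guessing_probability(joint):
    """Classical guessing probability: for each side value keep the largest joint weight.

    `joint` maps (target, side) pairs to probabilities P(target, side).
    """
    best = defaultdict(float)
    for (target, side), prob in joint.items():
        best[side] = max(best[side], prob)
    return sum(best.values())


def example_probabilities(d):
    """Return (P_guess(Y|E), P_guess(Y_0|E), P_guess(Y_1|E), P_guess(Y_C|EC))."""
    whole = defaultdict(float)
    part0 = defaultdict(float)
    part1 = defaultdict(float)
    pointed = defaultdict(float)
    weight = 1.0 / (2 * d * d)  # uniform y0, y1 and a fair coin
    for y0, y1, coin in itertools.product(range(d), range(d), (0, 1)):
        e = y0 if coin == 0 else y1   # E = Y_0 or E = Y_1, each with probability 1/2
        c = 1 if coin == 0 else 0     # C = 1 if E = Y_0, C = 0 if E = Y_1
        whole[((y0, y1), e)] += weight
        part0[(y0, e)] += weight
        part1[(y1, e)] += weight
        pointed[((y0, y1)[c], (e, c))] += weight
    return tuple(guessing_probability(j) for j in (whole, part0, part1, pointed))
```

For a given d, `example_probabilities(d)` should return values close to 1/d, 1/2 + 1/(2d), 1/2 + 1/(2d) and 1/d, matching the min-entropies stated in the text.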

It is interesting to note that (3) still holds if we consider
a somewhat "helpful" physical model in which in
addition to the encoding one might learn a small number
of "leaked" bits of information about Y. More specifically,
if the NC-HV discloses m extra bits of information
then it follows from the chain rule for the min-entropy
(see appendix) that

$$H_\infty(Y_C|EC) \gtrsim \frac{H_\infty(Y_0Y_1|E)}{2} - m . \tag{4}$$



Violation in quantum theory. Our main result
shows that (3) is violated in the strongest possible sense
by quantum theory. More specifically, we provide an explicit
construction that demonstrates this violation: Let
$Y = Y_0Y_1$ be uniformly distributed over $\{0,\dots,d-1\}^2$.
Given $y = y_0y_1 \in \{0,\dots,d-1\}^2$, define its encoding
$\rho_{y_0y_1} = |\Psi_{y_0y_1}\rangle\langle\Psi_{y_0y_1}|$ as

$$|\Psi_{y_0y_1}\rangle := X_d^{y_0} Z_d^{y_1} |\Psi\rangle , \tag{5}$$

where $X_d$ and $Z_d$ are the generalized Pauli matrices and

$$|\Psi\rangle := \frac{1}{\sqrt{2\left(1 + \frac{1}{\sqrt{d}}\right)}} \left(|0\rangle + F|0\rangle\right) , \tag{6}$$

with F being the matrix of the Fourier transform over
$\mathbb{Z}_d$. Since we are only interested in showing a quantum
violation, we will for simplicity always assume that d is
prime [53]. The system YE is then described by the ccq-state

$$\rho_{Y_0Y_1E} = \frac{1}{d^2} \sum_{y_0,y_1} |y_0\rangle\langle y_0| \otimes |y_1\rangle\langle y_1| \otimes \rho_{y_0y_1} . \tag{7}$$



We first prove that $H_\infty(Y|E) = \log d$ for our choice of
encoding. We then show the striking fact that, even
though the encoding we defined gives very little information
about the whole string Y, for any adversarially
chosen random variable C (possibly correlated with our
encoding) one can guess $Y_C$ from its encoding $\rho_E$ with
essentially constant probability. More precisely, for any
ccqc-state $\rho_{Y_0Y_1EC}$, with $C \in \{0,1\}$, that satisfies the
consistency relation $\mathrm{tr}_C(\rho_{Y_0Y_1EC}) = \rho_{Y_0Y_1E}$, we have

$$H_\infty(Y_C|EC) \approx 1 \tag{8}$$

for any sufficiently large d. This shows that the inequality
(3) can be violated arbitrarily (with d), giving a striking
example of the malleability of quantum information.
What's more, it is not hard to show that this effect still
holds even for $H^\varepsilon_\infty$, for constant error $\varepsilon$, and a "helpful"
physical model leaking $m \approx c \log d$ bits of information
with $c < 1/2$. Hence, the violation of the inequality (3)
has the appealing feature of being very robust. Indeed,
for any number of bits m a NC-HV might leak in addition,
we could find a d to ensure a violation.



PROOF OF THE QUANTUM VIOLATION 

We now provide an outline of the proof that the encoding
specified in (5) leads to a quantum violation of
the splitting inequality (3); for completeness, we provide
a more detailed derivation in the appendix. Our proof
proceeds in three steps: first, by computing $H_\infty(Y|E)$
we show that the encoding does indeed not reveal much
information about the whole. Second, we compute the
optimal measurements for extracting $Y_0$ and $Y_1$ on average,
and show that these measurements perform equally
well for any other prior distribution on Y. Finally, we
show that even introducing an additional system C does
not change one's ability to extract $Y_C$ from the encoding.

FIG. 1: Intuitively, one can also understand our result in terms of a
game between Bob and a malicious challenger, the Owl. Imagine Bob
is taking a philosophy class teaching him knowledge about Y, clearly
chosen uniformly at random. Unfortunately, he never actually attended
and had insufficient time to prepare for his exam. Luckily, however, he
has been given some encoding E of the possible answers $Y_0Y_1$, hastily
prepared by his old friend Alice. When entering the room, he had to
submit E for inspection to the challenger who knows $Y_0$, $Y_1$ as well
as the encoding Alice might use. After inspection, the challenger may
secretly keep a system C, possibly correlated with E, but such that
the reduced system on $Y_0$, $Y_1$ and E looks untampered with. It is
immediately obvious to the challenger that Bob must be ignorant about
the whole of $Y_0Y_1$. But can it always measure and point to a $C = c$
such that Bob is ignorant about $Y_c$? That is, can it always detect Bob's
ignorance by challenging him to output a single $Y_c$? Classically, this is
indeed possible: ignorance about the whole of $Y_0Y_1$ implies significant
ignorance about one of the parts, $Y_c$. However, a quantum Bob could
beat the Owl.

Step 1: Very intuitively, ignorance about the whole
string already follows from Holevo's theorem and the fact
that we are trying to encode 2 dits into a d-dimensional
quantum system. To see this more explicitly, recall
that showing $H_\infty(Y|E) = \log d$ is equivalent to showing that
$P_{\rm guess}(Y|E) = 1/d$. From (1) we have that this guessing
probability is given by the solution to the following
semidefinite program (SDP)

maximize $\frac{1}{d^2} \sum_{y_0,y_1} \mathrm{tr}\left(M_{y_0y_1} |\Psi_{y_0y_1}\rangle\langle\Psi_{y_0y_1}|\right)$
subject to $M_{y_0y_1} \geq 0$ for all $y_0, y_1$ ,
$\sum_{y_0,y_1} M_{y_0y_1} = \mathbb{I}$ .

The dual SDP is easily found to be

minimize $\mathrm{tr}(Q)$
subject to $Q \geq \frac{1}{d^2} |\Psi_{y_0y_1}\rangle\langle\Psi_{y_0y_1}|$ for all $y_0, y_1$ .

Let $v_{\rm primal}$ and $v_{\rm dual}$ be the optimal values of the primal
and dual respectively. By the property of weak
duality, $v_{\rm dual} \geq v_{\rm primal}$ always holds. Hence, to prove
our result, we only need to find primal and dual solutions
for which $v_{\rm primal} = v_{\rm dual} = 1/d$. It is easy
to check that $Q = \mathbb{I}/d^2$ is a dual solution with value
$v_{\rm dual} = \mathrm{tr}(Q) = 1/d$. Similarly, consider the measurement
$M_{y_0y_1} = |\Psi_{y_0y_1}\rangle\langle\Psi_{y_0y_1}|/d$. Using Schur's lemma,
one can directly verify that $\sum_{y_0,y_1} M_{y_0y_1} = \mathbb{I}$, giving
$v_{\rm primal} = 1/d$. The claimed value of the conditional min-entropy
follows.
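
The primal and dual solutions named in Step 1 can be checked numerically for small d. The sketch below is a hedged illustration rather than part of the proof: the function names are hypothetical, and the conventions X|j> = |j+1 mod d>, Z|j> = omega^j |j> and F_{jk} = omega^{jk}/sqrt(d) with omega = exp(2 pi i/d) are assumptions, since the text does not fix them explicitly.

```python
import numpy as np


def encoding_states(d):
    """Return {(y0, y1): |Psi_{y0 y1}>} for the encoding X^y0 Z^y1 |Psi>."""
    omega = np.exp(2j * np.pi / d)
    X = np.roll(np.eye(d), 1, axis=0)      # assumed convention: X|j> = |j+1 mod d>
    Z = np.diag(omega ** np.arange(d))     # assumed convention: Z|j> = omega^j |j>
    F = omega ** np.outer(np.arange(d), np.arange(d)) / np.sqrt(d)  # Fourier matrix
    psi = np.zeros(d, dtype=complex)
    psi[0] = 1.0
    psi = psi + F[:, 0]                    # |0> + F|0>
    psi /= np.linalg.norm(psi)
    return {
        (y0, y1): np.linalg.matrix_power(X, y0) @ np.linalg.matrix_power(Z, y1) @ psi
        for y0 in range(d)
        for y1 in range(d)
    }


def check_step1(d):
    """Return (POVM residual, primal value, dual value, smallest dual-constraint eigenvalue)."""
    projectors = [np.outer(v, v.conj()) for v in encoding_states(d).values()]
    povm = [P / d for P in projectors]                   # candidate measurement M_y
    residual = np.linalg.norm(sum(povm) - np.eye(d))     # zero iff the M_y sum to the identity
    primal = sum(np.trace(M @ P).real for M, P in zip(povm, projectors)) / d**2
    Q = np.eye(d) / d**2                                 # candidate dual solution
    dual = np.trace(Q).real
    # Dual feasibility: Q - |Psi_y><Psi_y| / d^2 must be positive semidefinite for every y.
    slack = min(np.linalg.eigvalsh(Q - P / d**2).min() for P in projectors)
    return residual, primal, dual, slack
```

If the conventions above match the intended ones, the residual is zero up to rounding, both objective values equal 1/d, and the smallest eigenvalue of the dual constraint is non-negative up to rounding.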

Step 2: A similar argument, exploiting the symmetries
in the encoding, can be used to show that

$$P_{\rm guess}(Y_0|E) = P_{\rm guess}(Y_1|E) = \frac{1}{2} + \frac{1}{2\sqrt{d}} . \tag{9}$$

The measurements that attain these values are given by
the eigenbases of $Z_d$ and $X_d$ respectively.

As a remark to quantum information theorists, note 
that this means that our encoding doubles as a random 
access encoding of the string y into a d-dimensional quantum
state $\rho_y$ with probability (9) to recover $y_0$ or $y_1$. For
d = 2, such encodings have previously been considered 
in the realm of contextuality as a reinterpretation of the 
CHSH inequality [HHO]. However, we note that this is 
not what is surprising here, as there exists an obvious 
classical random access encoding for 2 dits into a single 
dit (see discussion on C above) , with recovery probability 
$1/2 + 1/(2d)$.

Simply computing (9) is hence insufficient for our purposes.
Let us write $\{|y_0\rangle,\ y_0 \in \{0,\dots,d-1\}\}$ for the
eigenbasis of $Z_d$, and note that its Fourier transform
$\{F|y_1\rangle,\ y_1 \in \{0,\dots,d-1\}\}$ is then the eigenbasis of $X_d$.
Exploiting the symmetries in our problem, it is straightforward
to verify that for all $y_0, y_1 \in \{0,\dots,d-1\}$

$$|\langle y_0|\Psi_{y_0y_1}\rangle|^2 = |\langle y_1|F^\dagger|\Psi_{y_0y_1}\rangle|^2 = \frac{1}{2} + \frac{1}{2\sqrt{d}} . \tag{10}$$

An important consequence of this is that for any other
prior distribution $P_{Y_0Y_1}$, measurement in the $Z_d$ eigenbasis
distinguishes the states

$$\sigma_{y_0} = \sum_{y_1} P_{Y_0Y_1}(y_0,y_1)\, |\Psi_{y_0y_1}\rangle\langle\Psi_{y_0y_1}| , \tag{11}$$

with probability at least $1/2 + 1/(2\sqrt{d})$, even when the
distribution is unknown. A similar argument can be
made for the marginal states $\sigma_{y_1}$ and measurement in
the $X_d$ eigenbasis.
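
The prior-independence claimed in Step 2 can be illustrated numerically. The following sketch is written under assumptions: the function name is hypothetical, and the conventions X|j> = |j+1 mod d>, Z|j> = omega^j |j> and F_{jk} = omega^{jk}/sqrt(d) are our choice. For an arbitrary prior it computes the probability that the computational-basis (Z_d eigenbasis) outcome equals y_0 and that the Fourier-basis (X_d eigenbasis) outcome equals y_1.

```python
import numpy as np


def recovery_probabilities(d, prior):
    """Success probabilities of the Z_d- and X_d-eigenbasis measurements under `prior`.

    `prior[y0, y1]` is an arbitrary probability distribution over the encoded strings.
    """
    omega = np.exp(2j * np.pi / d)
    idx = np.arange(d)
    F = omega ** np.outer(idx, idx) / np.sqrt(d)   # assumed Fourier convention
    psi = F[:, 0].copy()
    psi[0] += 1.0                                  # |0> + F|0>
    psi /= np.linalg.norm(psi)
    p_y0 = 0.0
    p_y1 = 0.0
    for y0 in range(d):
        for y1 in range(d):
            # X^y0 Z^y1 |Psi>: multiply component j by omega^(j*y1), then shift by y0.
            state = np.roll(omega ** (idx * y1) * psi, y0)
            p_y0 += prior[y0, y1] * abs(state[y0]) ** 2                 # outcome y0, Z_d basis
            p_y1 += prior[y0, y1] * abs((F.conj().T @ state)[y1]) ** 2  # outcome y1, X_d basis
    return p_y0, p_y1
```

Because every overlap in (10) takes the same value, both returned numbers should equal 1/2 + 1/(2 sqrt(d)) for any prior passed in, which is the property used in Step 3.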

Step 3: It now remains to show that, for any possible
choice of an additional classical system C [23], one
can still guess $Y_C$ from the encoding with a good success
probability: one cannot construct a C which would
"point to the unknown". Note that we may express the
joint state with any other system C as

$$\rho_{Y_0Y_1EC} = \frac{1}{d^2} \sum_{y_0,y_1} |y_0\rangle\langle y_0| \otimes |y_1\rangle\langle y_1| \otimes \rho^{EC}_{y_0y_1} , \tag{12}$$

for some states $\rho^{EC}_{y_0y_1}$ on registers E and C. Since the
reduced state on $Y_0$, $Y_1$ and E should be the same for any
C, we have by the fact that $Y_0$ and $Y_1$ are classical that
$\mathrm{tr}_C(\rho^{EC}_{y_0y_1}) = |\Psi_{y_0y_1}\rangle\langle\Psi_{y_0y_1}|$. Since $|\Psi_{y_0y_1}\rangle\langle\Psi_{y_0y_1}|$ is a
pure state, this implies that $\rho^{EC}_{y_0y_1} = |\Psi_{y_0y_1}\rangle\langle\Psi_{y_0y_1}| \otimes \sigma^{C}_{y_0y_1}$.
Now imagine that we were to perform some arbitrary
measurement on C, whose outcome would supposedly
point to an unknown substring. But this merely creates
a different distribution $\tilde{P}_{Y_0Y_1}$ over encoded strings,
and we already know from the above that we can still
succeed in retrieving either $y_0$ or $y_1$ with probability at
least $1/2 + 1/(2\sqrt{d})$ by making a measurement in the
$Z_d$ or $X_d$ eigenbasis respectively. Hence for large d we have a
recovery probability of roughly 1/2, implying

$$H_\infty(Y_0|E, C=0) \approx H_\infty(Y_1|E, C=1) \approx 1 , \tag{13}$$

which is our main claim.

Note that the consistency condition, which states that 
our choice of C should be compatible with the original 
situation and should not affect the reduced state, is im- 
portant, and makes our task non-trivial. As an example, 
consider our construction for d = 2. In that case the 
encoding states lie in the XZ-plane of the Bloch sphere. 
Imagine now that we measured the encoding register E 
in the eigenbasis of $\sigma_y$, and let the outcome be C. But
for any measurement in the eigenbasis of $\sigma_y$ we observe
entirely random outcomes, and the post-measurement 
states trivially no longer carry any information about the 
encoded string. Indeed, any choice of C would do if we 
are allowed to destroy information in such a manner. 
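
The d = 2 remark can be made concrete with a few lines of linear algebra. This is a sketch under assumptions (hypothetical function name; for d = 2 we take X and Z to be the Pauli matrices and F the Hadamard matrix): it builds the four encoding states and computes the outcome distribution of a measurement in the eigenbasis of sigma_y for each of them.

```python
import numpy as np


def sigma_y_outcome_probabilities():
    """For d = 2, map each string (y0, y1) to the sigma_y outcome distribution of its encoding."""
    sx = np.array([[0, 1], [1, 0]], dtype=complex)
    sy = np.array([[0, -1j], [1j, 0]])
    sz = np.array([[1, 0], [0, -1]], dtype=complex)
    hadamard = (sx + sz) / np.sqrt(2)      # Fourier transform over Z_2
    psi = np.array([1, 0], dtype=complex)
    psi = psi + hadamard @ psi             # |0> + F|0>
    psi /= np.linalg.norm(psi)
    _, eigvecs = np.linalg.eigh(sy)        # columns are the sigma_y eigenvectors
    result = {}
    for y0 in (0, 1):
        for y1 in (0, 1):
            state = np.linalg.matrix_power(sx, y0) @ np.linalg.matrix_power(sz, y1) @ psi
            result[(y0, y1)] = np.abs(eigvecs.conj().T @ state) ** 2
    return result
```

All four encoding states have real amplitudes in the computational basis, so they lie in the XZ-plane and each sigma_y outcome occurs with probability 1/2 regardless of the encoded string. The outcome therefore carries no information about y, while the measurement disturbs the encoding register.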



IMPLICATIONS FOR CRYPTOGRAPHY 

Our result answers an interesting open question
in quantum cryptography [11], namely whether min-entropy
splitting can still be performed when conditioned
on quantum instead of classical knowledge. This technique
was used to deal with classical side information
E in |H |T2]. Our example shows that quantum min-entropy
splitting is impossible, even when we would be
willing to accept subtracting a large error term on the
r.h.s. of (3). This tells us that classical protocols that
rely on such statements may become insecure in the pres- 
ence of quantum side information, and highlights the im- 
portance of so-called min-entropy sampling results of [13] 
used in quantum cryptography [14] instead. It also in- 
dicates that contextuality may play a more important 
role in our understanding of the possibilities and limits 
of quantum cryptography than previously thought. 



DISCUSSION 

The first indication that something may be amiss when 
looking at knowledge from a quantum perspective was 
given by Schrödinger [15], who pointed out that one can
have knowledge (not ignorance) about the whole, while 
still being ignorant about the parts [25] . Here, we tackled 
this problem from a very different direction, starting with 
the premise that one has ignorance about the whole. 



Our results show that contextuality is responsible for 
much more significant effects than have previously been 
noted. In particular, it leads to arbitrarily large quantum
violations of (3), which can be understood as a Bell-type
inequality for non-contextuality. This is still true
even for a somewhat "helpful" physical model, leaking 
additional bits of information. To our knowledge, this is 
the first information-theoretic inequality distinguishing 
NC-HV models from quantum theory. Our question and 
perspective are completely novel, and we hope that our 
observations will lead to an increased understanding of 
the role of contextuality. In this work, we have considered 
standard NC-HVs in which all HVs can be decomposed 
as convex combinations of extremal HVs which give de- 
terministic outcomes for effects (see appendix). It is an 
interesting open question whether our results can be gen- 
eralized to very general models that distinguish between 
measurement and preparation contextuality pQ. 

At the heart of our result lies the fact that contextuality
allows for strong forms of complementarity in quantum
mechanics (often conflated with uncertainty [16]),
which intuitively is responsible for allowing the violation
of (3). Typically, complementarity is discussed by considering
examples of properties of a physical system that
one may be able to determine individually, but which 
cannot all be learned at once. In spirit, this is similar
to the notion of a random access encoding where we
could determine either property $Y_0$ or $Y_1$ quite well, but
not all of Y. However, as discussed above this can also
be true classically, in a probabilistic sense. We would 
thus like to emphasize the novelty of our perspective, as 
we approach the problem from the other end, and first 
demonstrate the general result that in an NC-HV igno- 
rance about the whole always implies ignorance about a 
part. We then show that in a quantum world, this prin- 
ciple is violated in the strongest possible sense, even with 
respect to an additional system C. One could think of 
this as a much more robust way of capturing the intuitive 
notion of complementarity [17] . 

Finally, it is an interesting open question whether our 
inequality can be experimentally verified. Note that this
is made difficult by the fact that our aim would be to test
ignorance rather than knowledge. However, it is con- 
ceivable that such an experiment can be performed by 
building a larger cryptographic protocol whose security 
relies on being ignorant about one of the parts of a string 
Y created during that protocol [25]. A quantum viola- 
tion could then be observed by breaking the security of 
the protocol, and exhibiting knowledge (rather than igno- 
rance) about some information that could not have been 
obtained if the protocol was secure.

In this appendix, we provide a detailed derivation of our
results. To this end, we first provide some more detailed
background on the entropic quantities we use in Section A.
In Section B we show that the splitting inequality
(B8) is satisfied in any deterministic non-contextual
hidden variable model (NC-HV model for short). This
is a minor twist on the existing classical proof [3] due
to Wullschleger [3]. Finally, in Section C we proceed to
prove our main result, that there exists a quantum encoding
which strongly violates the splitting inequality (B8).



Appendix A: Entropy measures 

Throughout, we will measure information in terms of
the min-entropy, which is directly related to the guessing
probability $P_{\rm guess}(Y|E)$ [2], where Y is a classical string
ranging in the set $\mathcal{Y}$ and E an auxiliary system. The
guessing probability is defined as the maximum probability
with which one can predict the whole string Y, given the
system E. The maximization is over all possible observations,
or measurements, on E; these vary depending on the physical
model (e.g. classical or quantum) under consideration.

Definition A.1. Let Y be a classical random variable
with distribution $P_Y$ taking values in a set $\mathcal{Y}$, and
$\{\mathcal{P}_y\}_{y \in \mathcal{Y}}$ any set of preparations on E. Then the maximum
guessing probability of Y given E is defined as

$$P_{\rm guess}(Y|E) := \max_{\{M_y\}} \sum_{y \in \mathcal{Y}} P_Y(y)\, p(y|\mathcal{P}_y, M_y) , \tag{A1}$$

where the maximum is taken over all measurements $\mathcal{M} = \{M_y\}$
allowed in the model.

For instance, in the case of quantum mechanics we
simply have

$$P_{\rm guess}(Y|E) := \max_{\forall y:\, M_y \geq 0} \sum_{y} P_Y(y)\, \mathrm{tr}\left(M_y \rho^E_y\right) , \tag{A2}$$

where the maximization is taken over all POVMs [35] and
$\rho^E_y$ denotes the reduced state of the system on E, when
Y = y. For classical side-information E this expression
simplifies to

$$P_{\rm guess}(Y|E) = \sum_{e} P_E(e) \max_{y} P_{Y|E=e}(y) . \tag{A3}$$

In other words, for classical side information, the optimal 
guessing measurement is to simply output the y which is 
most likely given the classical value e. 

For the case of classical and quantum theories it is
known that the guessing probability directly relates to
the conditional min-entropy [2]. Here, we follow the operational
approach of [18], and define the conditional min-entropy
for an arbitrary theory with classical Y as

$$H_\infty(Y|E) := -\log P_{\rm guess}(Y|E) . \tag{A4}$$



For the case of quantum systems, the conditional min- 
entropy was first introduced by Renner jTH] as a way 
to measure randomness conditioned on an adversary's 
knowledge. The min-entropy can also be defined when
Y is quantum itself [19], but we will not need it here. In
the quantum setting, we will also use a smoothed version
of the quantum conditional min-entropy, defined for any
$\varepsilon \geq 0$ as:

$$H^\varepsilon_\infty(Y|E)_\rho := \max_{\tilde{\rho}_{YE} \in \mathcal{B}^\varepsilon(\rho_{YE})} H_\infty(Y|E)_{\tilde{\rho}} , \tag{A5}$$

where the maximization is taken over all (subnormalized)
states $\tilde{\rho}_{YE}$ within $\varepsilon$ trace distance of $\rho_{YE}$. A similar
definition could be made for arbitrary theories using the
distance defined in [18], but we will not require it here.

The conditional min-entropy has a number of appealing
properties, which for any NC-HV model essentially
follow from its operational interpretation, and also hold
in the quantum setting [19]. First of all consider the min-entropy
of classical Y Z conditioned on side information
E. Clearly, since guessing Y and Z can only be more
difficult than guessing Y alone, we have $P_{\rm guess}(YZ|E) \leq
P_{\rm guess}(Y|E)$. Translated, this gives monotonicity of the
min-entropy

$$H_\infty(YZ|E) \geq H_\infty(Y|E) . \tag{A6}$$

Similarly, to guess Y and Z from E one strategy would
be to guess Z (in the worst case, choosing $Z = z$ with
$z \in \mathcal{Z}$ taken from the uniform distribution) and then try
to guess Y knowing Z. In terms of guessing probabilities,
this means that $P_{\rm guess}(YZ|E) \geq P_{\rm guess}(Y|EZ)/|\mathcal{Z}|$.
Translated, we obtain the chain rule

$$H_\infty(Y|EZ) \geq H_\infty(YZ|E) - \log|\mathcal{Z}| . \tag{A7}$$
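
For classical Y, Z and E both properties can be tested directly from the classical expression for the guessing probability. The sketch below uses hypothetical names and assumes the joint distribution is stored as an array `joint[y, z, e]`. It returns the slack in the monotonicity inequality (A6) and in the chain rule (A7), with entropies measured in bits.

```python
import numpy as np


def min_entropy(joint, target_axes):
    """H_min(target | remaining axes) in bits for a classical joint distribution."""
    # Classical guessing probability: best guess for each value of the side information.
    p_guess = joint.max(axis=target_axes).sum()
    return -np.log2(p_guess)


def check_properties(joint):
    """`joint[y, z, e]` = P(Y=y, Z=z, E=e). Returns the slack in (A6) and in (A7)."""
    h_yz_e = min_entropy(joint, (0, 1))            # guess Y and Z from E
    h_y_e = min_entropy(joint.sum(axis=1), (0,))   # guess Y from E, with Z marginalised out
    h_y_ez = min_entropy(joint, (0,))              # guess Y from E and Z
    num_z = joint.shape[1]
    monotonicity_slack = h_yz_e - h_y_e
    chain_rule_slack = h_y_ez - (h_yz_e - np.log2(num_z))
    return monotonicity_slack, chain_rule_slack
```

Both returned numbers should be non-negative (up to rounding) for every joint distribution, which is exactly the content of (A6) and (A7) in the classical case.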



A final property that will be important to us is that,
as a direct consequence of (A4), we may also write the
min-entropy as

$$H_\infty(Y|E) = \min_{\mathcal{M}} H_\infty(Y|\mathcal{M}(E)) , \tag{A8}$$

where the minimization is taken over all measurements
$\mathcal{M}$, and $H_\infty(Y|\mathcal{M}(E))$ is the min-entropy conditioned on
the classical information obtained by measuring E with
$\mathcal{M}$.



Appendix B: A splitting inequality valid in any 
NC-HV model 



Before turning to the proof of the generalized splitting
inequality, let us briefly review what is meant by a non-contextual
model. In any physical theory, we can imagine
that a system is prepared according to some preparation
$\mathcal{P}$, on which we later make measurements $\mathcal{M}$. Each such
measurement can be viewed as a collection of elementary
effects $f$. The exact form of the effects depends on
the model one considers. For example, in quantum theory
the effects are simply given by POVM elements. A
particularly useful effect is given by the so-called unit effect
$\mathbb{I}$, corresponding to the identity in the quantum or
classical setting. Hence to any effect one can associate a
two-outcome measurement $\mathcal{M}_f = \{f, \mathbb{I} - f\}$.

When discussing non-contextuality, this measurement
is typically interpreted as a question one might pose to
the underlying physical system and has two answers,
"yes" for $f$ and "no" for $\mathbb{I} - f$. We hence also refer to
$f$ as a question. Of course, one might consider measurements
that ask many questions simultaneously, that
is, they consist of many individual effects. Two effects 
are called compatible if the corresponding questions can 
be answered simultaneously without causing disturbance 
to the underlying physical system, in the sense that we 
would obtain the same answers again were we to ask the 
same questions repeatedly. 

A set of mutually compatible effects/questions is
thereby called a context. For example, if $f_1$ is compatible
with $f_2$ the set $\mathcal{C}_1 = \{f_1, f_2\}$ is called a context. Similarly,
if $f_1$ is compatible with $f_3$, then the set $\mathcal{C}_2 = \{f_1, f_3\}$ is
also a context. Note, however, that in such a scenario it
can still be that $f_2$ and $f_3$ are not compatible. That is,
any effect can be part of multiple distinct contexts.

For each effect in a particular context, one can pose
the question $f$ by making the measurement $\mathcal{M}_f$ defined
above. Informally, a model is called non-contextual if the
answer to question $f_1$ will always be the same in both
contexts, whether $\mathcal{M}_{f_2}$ or $\mathcal{M}_{f_3}$ are performed simultaneously
(which is possible by definition of being compatible).
In our example this means that if we were to make
measurement $\mathcal{M}_{f_1}$ in context $\mathcal{C}_1$, or context $\mathcal{C}_2$, we would
always obtain the same distribution on outcomes.

1. Classical theory

Recall that the phenomenon of min-entropy splitting
guarantees that, if a string $Y_0Y_1$ has high min-entropy
then there is a way to split it by introducing a binary
random variable C such that the string $Y_C$ has about half
as much min-entropy as $Y_0Y_1$. Classically, min-entropy
splitting follows from the following statement.

Lemma B.1 ([J], Lemma 4.2). Let $\varepsilon \geq 0$ and $Y_0$, $Y_1$ be two
random variables such that $H^\varepsilon_\infty(Y_0Y_1|E) \geq \alpha$, where E
is classical. Then, there exists a binary random variable
C such that $H^\varepsilon_\infty(Y_C C|E) \geq \alpha/2$.

Using the chain rule (A7) and the monotonicity (A6) of
the min-entropy one immediately obtains the statement
of min-entropy splitting

$$H^{\varepsilon+\varepsilon'}_\infty(Y_C|EC) \geq \alpha/2 - 1 - \log(1/\varepsilon') . \tag{B1}$$
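
To make the lemma less abstract, the sketch below illustrates one standard way of choosing the pointer C in the simplest setting: no smoothing and trivial side information E. This construction is an assumption made for illustration (the proof in the cited work may proceed differently), and the function name is hypothetical. The idea is to call a value y_1 "heavy" if its marginal probability is at least 2^{-alpha/2}, to point at Y_0 when y_1 is heavy, and to point at Y_1 otherwise.

```python
import numpy as np


def split(joint):
    """`joint[y0, y1]` = P(Y_0 = y0, Y_1 = y1). Returns (alpha, H_min(Y_C C)) in bits."""
    alpha = -np.log2(joint.max())                # H_min(Y_0 Y_1) without side information
    threshold = 2.0 ** (-alpha / 2)
    p_y1 = joint.sum(axis=0)                     # marginal distribution of Y_1
    heavy = p_y1 >= threshold                    # values of Y_1 that are individually likely
    # Pointer: C = 0 (ask for Y_0) when y1 is heavy, C = 1 (ask for Y_1) otherwise.
    p_y0_and_c0 = joint[:, heavy].sum(axis=1)    # P(Y_0 = v, C = 0)
    p_y1_and_c1 = np.where(heavy, 0.0, p_y1)     # P(Y_1 = v, C = 1)
    p_max = max(p_y0_and_c0.max(), p_y1_and_c1.max())
    return alpha, -np.log2(p_max)
```

The bound follows from counting: there are at most 2^{alpha/2} heavy values, each contributing at most 2^{-alpha} to P(Y_0 = v, C = 0), while every non-heavy value has probability below 2^{-alpha/2} by definition. The second returned value should therefore be at least alpha/2.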



2. Non-contextual hidden variable models 

Typically, in a non-contextual hidden variable model it
is assumed that a preparation $\mathcal{P}$ is simply a distribution
over hidden variables $\lambda \in \Lambda$, and a measurement then corresponds
to "reading out" such hidden variables. Each
outcome event $k \in K$ is associated with a corresponding
effect $f_k$, where intuitively $f_k$ "reads out" the hidden
variables by mapping a certain subset of possible hidden
variables to the outcome $k \in K$. In contrast, some works
consider more generalized scenarios known as ontological
models [1]. The main difference here is that these hidden
variable models can locally model even contextual
theories, but specify explicit conditions to make these
generalized theories non-contextual again.

In this section we show that the splitting inequality 
holds in any standard deterministic NC-HV, which is the 
definition taken in most previous work, as in e.g. [H[S]. 
We will, however, phrase our result in the general lan- 
guage of non-contextual models as introduced in [1] , re- 
stricting our attention to those models which are deter- 
ministic. 



a. Background 

Very intuitively, a non-contextual ontological model for
an operational theory associates intrinsic attributes to
every physical system, which are supposed to exist independently
of the particular context in which the system
might be observed. These attributes are described by a
set of hidden variables $\lambda \in \Lambda$. Hence for us a hidden
variable model consists of the following:

1. A set of hidden variables $\Lambda$.

2. For every preparation $\mathcal{P}$ in the physical theory, a
probability distribution $p(\lambda|\mathcal{P})$ over $\lambda \in \Lambda$.

3. For every $\ell$-outcome measurement $\mathcal{M}$, and hidden
variable $\lambda \in \Lambda$, a probability distribution $p(k|\lambda,\mathcal{M})$
over $k \in [\ell]$, the set of $\ell$ outcome labels.

The model is indeed a model for the physical theory if
it accurately predicts the outcome distribution of any
measurement on any preparation, i.e. performing measurement
$\mathcal{M}$ on preparation $\mathcal{P}$ produces outcome k with
probability

$$p(k|\mathcal{P},\mathcal{M}) = \sum_{\lambda \in \Lambda} p(k|\lambda,\mathcal{M})\, p(\lambda|\mathcal{P}) , \tag{B2}$$

where for notational simplicity we assume that $\Lambda$ is discrete.

Effects. We adopt the common notion that measurements
are a collection of elementary effects. Here, an
effect is a linear functional $f : \Lambda \to [0,1]$, mapping
hidden variables to outcomes. As is common in the
study of non-contextuality [5], we will consider only measurements
which are a collection of deterministic effects
$f_k : \Lambda \to \{0,1\}$. That is, we effectively work with a
deterministic model. Much more general scenarios are
certainly possible pQ but we will not consider them here.
Note that a deterministic model does not mean that there
is no more randomness: preparations are given as probability
distributions over hidden variables and hence we
generally do observe non-deterministic outcomes when
measuring a preparation. Of particular importance is
the unit effect $\mathbb{I}$ (i.e., the identity), which obeys $\mathbb{I}(\lambda) = 1$
for all $\lambda \in \Lambda$. A measurement is thus a collection
$\mathcal{M} := \{f_k \mid \sum_k f_k = \mathbb{I}\}$, where we usually index the
effects by the outcome that they give in $\mathcal{M}$. We write
the probability of obtaining the outcome k using measurement
$\mathcal{M}$ containing the effect $f_k$ as

$$p(k|\lambda, f_k) := p(k|\lambda,\mathcal{M}) = f_k(\lambda) . \tag{B3}$$

Note that with every effect, we can again associate a two-outcome
measurement $\mathcal{M}_f = \{f, \mathbb{I} - f\}$ where without
loss of generality we label $f$ using the outcome '1' and
$\mathbb{I} - f$ using the outcome '0'. When concerned with such a
measurement $\mathcal{M}_f$ we thus also use $p(1|\lambda, f)$ and $p(0|\lambda, \mathbb{I} - f)$
to denote the probabilities of obtaining outcomes '1'
and '0' respectively.

Extensions. Often we wish to relate one physical system
to another. For example, we may wish to perform
an additional independent experiment such as flipping a
coin. Given a system with a set of hidden variables $\Lambda$,
we allow its extension to a second system in the following
way: if $\Lambda'$ is another set of hidden variables used to
describe another physical system, then the combined system
will have hidden variables $\Lambda \times \Lambda'$. For every preparation
$\mathcal{P}$ on the original system, we say that $\mathcal{P}'$ is an
extension of $\mathcal{P}$ in the combined system if for every $\lambda \in \Lambda$

$$p(\lambda|\mathcal{P}) = \sum_{\lambda' \in \Lambda'} p'((\lambda,\lambda')|\mathcal{P}') . \tag{B4}$$



A measurement A4' is similarly said to extend M. as long 



Kj|A,M) = 5>'(j|(A,A'),M') 
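
The coin-flip example can be made concrete with a small sketch. It assumes, purely for illustration, that the second system is an independent coin, so that the extended preparation is a product distribution; the function names are hypothetical. The check is that summing out the second hidden variable recovers the original preparation, as required of an extension.

```python
from fractions import Fraction
from itertools import product


def extend_with_coin(preparation, coin):
    """Product distribution on pairs (lam, c): p'((lam, c)) = p(lam) * p(c)."""
    return {
        (lam, c): p * pc
        for (lam, p), (c, pc) in product(preparation.items(), coin.items())
    }


def marginal(extended):
    """Sum the extended preparation over the second hidden variable."""
    out = {}
    for (lam, _), p in extended.items():
        out[lam] = out.get(lam, 0) + p
    return out
```

Correlated extensions, such as the one used in the proof of Theorem B.2, satisfy the same marginal condition without being product distributions.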



Preparations. To study our problem, we will assume that there is an implicit prior distribution on preparations $\mathcal{P}$ describing prior knowledge about the state of the system under consideration. More specifically we will be concerned with encodings of a string $y$ into preparations $\mathcal{P}_y$, where the probability $P_Y(y)$ of choosing the string $y$ translates into a prior probability on the preparation as

$$p(\mathcal{P}_y) := P_Y(y) . \tag{B6}$$

b. Splitting inequality


We are now ready to generalize Lemma B.1 to any
deterministic NC-HV model. The analogue of (Bf ) is 
then an easy corollary. Note that in this statement, the conditional min-entropy is understood as being defined through the guessing probability (A1) as in equation (A4). This assumes given a fixed distribution $P_{Y_0Y_1}$ on the strings $y_0y_1$, through which a prior distribution on the preparations $\mathcal{P}_{y_0y_1}$ follows as explained at the end of Section B2.



Theorem B.2. Let a NC-HV model $\mathfrak{M}$ be given, with corresponding set of hidden variables $\Lambda$. Let $Y = Y_0Y_1$ be two classical random variables each taking values in a finite set $\mathcal{Y}$, and $\{\mathcal{P}_{y_0y_1}\}_{(y_0,y_1)\in\mathcal{Y}^2}$ a corresponding fixed set of preparations on a register $E$ such that

$$H_\infty(Y_0Y_1|E) \geq \alpha . \tag{B7}$$

Then there exists an extended model $\mathfrak{M}'$ over the set of hidden variables $\Lambda' = \Lambda \times \{0_c, 1_c\}$, and a set of preparations $\mathcal{P}'_{y_0y_1c}$, for $c \in \{0,1\}$, extending the $\mathcal{P}_{y_0y_1}$ and such that

$$H_\infty(Y_C C|E) \geq \frac{\alpha}{2} . \tag{B8}$$


Proof. Recall that we assume a prior distribution on the preparations given by $p(\mathcal{P}_{y_0y_1}) := P_{Y_0Y_1}(y_0y_1)$. This lets us define the guessing probability, which by assumption is such that

$$2^{-\alpha} \geq P_{\rm guess}(Y_0Y_1|E) . \tag{B9}$$

To rewrite the r.h.s. in terms of hidden variables, first of all note that given the prior distribution over preparations we can write the probability of a particular hidden variable $\lambda \in \Lambda$ as

$$p(\lambda) = \sum_{y_0,y_1} p(\mathcal{P}_{y_0y_1})\, p(\lambda|\mathcal{P}_{y_0y_1}) . \tag{B10}$$



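Fix a measurement $\mathcal{M} = \{f_k\}_k$, where we indexed the effects by their outcome in the measurement. By definition, the probability of observing the outcome $k$ when $\mathcal{M}$ is performed on the preparation $\mathcal{P}_{y_0y_1}$ is

$$p(k|\mathcal{P}_{y_0y_1}, \mathcal{M}) = \sum_\lambda p(\lambda|\mathcal{P}_{y_0y_1})\, p(k|\lambda, f_k) . \tag{B11}$$

The overall probability of observing the outcome $k$ when $\mathcal{M}$ is performed on the preparation $\mathcal{P}$ corresponding to the mixture of the preparations $\mathcal{P}_{y_0y_1}$ with associated probabilities $p(\mathcal{P}_{y_0y_1})$ is then

$$p(k) := p(k|\mathcal{P}) = \sum_{y_0,y_1} p(\mathcal{P}_{y_0y_1})\, p(k|\mathcal{P}_{y_0y_1}) . \tag{B12}$$

Note that by definition the hidden variables $\lambda$ give deterministic outcomes under the measurement of any effect, and hence $p(k|\lambda, f_k) = \lambda_{f_k}$, where $\lambda_{f_k}$ is 1 if the measurement $\{f_k, \mathbb{I} - f_k\}$ deterministically produces the outcome '$f_k$' when performed on a system in state $\lambda$, and 0 otherwise. Using Bayes' rule twice we obtain

$$\begin{aligned}
p(k)\, p(\mathcal{P}_{y_0y_1}|k) &= p(\mathcal{P}_{y_0y_1})\, p(k|\mathcal{P}_{y_0y_1}) \\
&= p(\mathcal{P}_{y_0y_1}) \sum_\lambda p(\lambda|\mathcal{P}_{y_0y_1})\, p(k|\lambda, f_k) \\
&= \sum_\lambda p(\lambda)\, p(k|\lambda, f_k)\, p(\mathcal{P}_{y_0y_1}|\lambda) .
\end{aligned} \tag{B13}$$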



Using (A3) and (A8) we obtain that for any measurement $\mathcal{M} = \{f_k\}$, the guessing probability of $Y_0Y_1$ is determined by the maximum posterior probability of any string $y_0y_1$, conditioned on obtaining the outcome $k$ when measuring $\mathcal{P}$ with $\mathcal{M}$, so that (B9) implies

$$2^{-\alpha} \geq \sum_k p(k) \max_{y_0y_1} p(\mathcal{P}_{y_0y_1}|k) = \sum_\lambda p(\lambda) \max_{y_0y_1} \sum_k p(k|\lambda, f_k)\, p(\mathcal{P}_{y_0y_1}|\lambda) , \tag{B14}$$

where in order to invert the summations over $k$ and $\lambda$ with the maximization we used the fact that for any $k$, there exists exactly one $\lambda$ such that $\lambda_{f_k} = 1$ and vice-versa, so that the summation over $\lambda$ (resp. over $k$) which is after the max in the expressions above contains exactly one term. This is a consequence of the fact that the $f_k$ form a measurement, so that $\sum_k f_k = \mathbb{I}$, together with the variables $\lambda$ being deterministic, so that $p(k|\lambda, f_k)$ can only be either 0 or 1.

We now need to define the additional single-bit random variable $C$, which is intuitively supposed to designate which of the two halves, $y_0$ or $y_1$, the preparation $\mathcal{P}_{y_0y_1}$ contains the least amount of information about, so that we can indeed lower-bound the min-entropy $H_\infty(Y_C C|\mathcal{P})$. For this we allow $C$ to be correlated with the preparation $\mathcal{P}_{y_0y_1}$.

In order to accommodate $C$, we extend the set of hidden variables to $\Lambda' = \Lambda \times \{0_c, 1_c\}$. Define $q_1 = \sum_\lambda p(\lambda|\mathcal{P}_{y_0y_1})$, where the sum ranges over all $\lambda$ such that $\sum_{y_0} p(\mathcal{P}_{y_0y_1}|\lambda) \geq 2^{-\alpha/2}$, and $q_0 = 1 - q_1$. Note that $q_0$ can be computed by the same summation, but now ranging over all $\lambda$ such that $\sum_{y_0} p(\mathcal{P}_{y_0y_1}|\lambda) < 2^{-\alpha/2}$. Define two preparations as follows:



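• $\mathcal{P}_{y_0 y_1 1}$ is defined through the distribution

$$p((\lambda, 1_c)|\mathcal{P}_{y_0 y_1 1}) = \begin{cases} p(\lambda|\mathcal{P}_{y_0y_1})/q_1 & \text{if } \sum_{y_0} p(\mathcal{P}_{y_0y_1}|\lambda) \geq 2^{-\alpha/2} \\ 0 & \text{otherwise} \end{cases} \tag{B15}$$

and $p((\lambda, 0_c)|\mathcal{P}_{y_0 y_1 1}) = 0$ for every $\lambda$.

• $\mathcal{P}_{y_0 y_1 0}$ is defined analogously by

$$p((\lambda, 0_c)|\mathcal{P}_{y_0 y_1 0}) = \begin{cases} p(\lambda|\mathcal{P}_{y_0y_1})/q_0 & \text{if } \sum_{y_0} p(\mathcal{P}_{y_0y_1}|\lambda) < 2^{-\alpha/2} \\ 0 & \text{otherwise} \end{cases} \tag{B16}$$

and $p((\lambda, 1_c)|\mathcal{P}_{y_0 y_1 0}) = 0$ for every $\lambda$.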

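Finally, we define the preparation $\mathcal{P}_{y_0 y_1 c}$ as the mixture of $\mathcal{P}_{y_0 y_1 1}$ with probability $q_1$, and of $\mathcal{P}_{y_0 y_1 0}$ with probability $q_0$. Note that the preparation $\mathcal{P}_{y_0 y_1 c}$ is indeed an extension of $\mathcal{P}_{y_0y_1}$ in the new theory, as $p((\lambda, 0_c)|\mathcal{P}_{y_0 y_1 c}) + p((\lambda, 1_c)|\mathcal{P}_{y_0 y_1 c}) = p(\lambda|\mathcal{P}_{y_0y_1})$. Finally, we update the prior on preparations by setting $p(\mathcal{P}_{y_0 y_1 1}) = q_1\, p(\mathcal{P}_{y_0y_1})$ and $p(\mathcal{P}_{y_0 y_1 0}) = q_0\, p(\mathcal{P}_{y_0y_1})$, so that

$$p(\mathcal{P}_{y_0y_1}) = p(\mathcal{P}_{y_0 y_1 0}) + p(\mathcal{P}_{y_0 y_1 1}) . \tag{B17}$$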



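One can check that with these definitions, whenever $\sum_{y_0} p(\mathcal{P}_{y_0y_1}|\lambda) \geq 2^{-\alpha/2}$ we have, using Bayes' rule twice,

$$\begin{aligned}
p(\mathcal{P}_{y_0 y_1 1}|(\lambda, 1_c)) &= \frac{p((\lambda, 1_c)|\mathcal{P}_{y_0 y_1 1})\, p(\mathcal{P}_{y_0 y_1 1})}{p((\lambda, 1_c))} \\
&= \frac{\big(p(\lambda|\mathcal{P}_{y_0y_1})/q_1\big) \cdot \big(q_1\, p(\mathcal{P}_{y_0y_1})\big)}{p(\lambda)} \\
&= p(\mathcal{P}_{y_0y_1}|\lambda) ,
\end{aligned} \tag{B18}$$

and 0 otherwise, where for the second equality we used $p(\lambda) = p((\lambda, 1_c))$ for all those $\lambda$ such that $p((\lambda, 1_c)|\mathcal{P}_{y_0 y_1 1})$ is not zero.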

From this point on, our proof follows very closely the classical proof of Lemma B.1. By definition, for every $y_1$ and every $\lambda$, we have that

$$\sum_{y_0} p(\mathcal{P}_{y_0 y_1 0}|(\lambda, 0_c)) < 2^{-\alpha/2} . \tag{B19}$$



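It does not seem possible to similarly bound $\sum_{y_1} p(\mathcal{P}_{y_0 y_1 1}|(\lambda, 1_c))$, but it is not necessary either, as we do not have access to this quantity directly. Rather, let $\mathcal{M} = \{f_k\}_k$ be any 2d-outcome measurement; as in (B14) we need to bound

$$\max_{y_0} \Big[ \sum_k p(k|(\lambda, 1_c), f_k) \sum_{y_1} p(\mathcal{P}_{y_0 y_1 1}|(\lambda, 1_c)) \Big] .$$

Note that by definition, $\sum_{y_0} p(\mathcal{P}_{y_0 y_1 1}|(\lambda, 1_c))$ is either 0 or at least $2^{-\alpha/2}$, so that for all $y_0$, $y_1$ and $\lambda$, we have the trivial bound

$$\begin{aligned}
p(\mathcal{P}_{y_0 y_1 1}|(\lambda, 1_c)) &\leq \max_{y_0y_1} p(\mathcal{P}_{y_0 y_1 1}|(\lambda, 1_c))\; 2^{\alpha/2} \sum_{\tilde{y}_0} p(\mathcal{P}_{\tilde{y}_0 y_1 1}|(\lambda, 1_c)) \\
&= \max_{y_0y_1} p(\mathcal{P}_{y_0y_1}|\lambda)\; 2^{\alpha/2} \sum_{\tilde{y}_0} p(\mathcal{P}_{\tilde{y}_0 y_1 1}|(\lambda, 1_c)) ,
\end{aligned} \tag{B20}$$

where for the last equality we used (B18). Summing this equation over all $y_1$ and combining it with (B14) lets us bound (B20) by $2^{-\alpha}\, 2^{\alpha/2} \cdot 1 = 2^{-\alpha/2}$. This bound together with (B19) proves the theorem. □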
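
The role of the pointer $C$ is easiest to see in the purely classical case without side information. The following minimal sketch is an illustration under that simplifying assumption and is not the hidden-variable construction itself: the threshold rule on the marginal of $Y_1$ mirrors the case distinction in (B15) and (B16), the convention that $c = 0$ points at $Y_1$ and $c = 1$ at $Y_0$ follows the bounds (B19) and (B20), and all function names are hypothetical.

```python
import math
from collections import defaultdict


def min_entropy(dist):
    """H_inf of a distribution given as {event: probability}."""
    return -math.log2(max(dist.values()))


def split_pointer(joint, alpha):
    """Assign c = 1 to pairs whose y1 has marginal weight >= 2^(-alpha/2)."""
    threshold = 2 ** (-alpha / 2)
    p_y1 = defaultdict(float)
    for (_, y1), p in joint.items():
        p_y1[y1] += p
    return {(y0, y1): int(p_y1[y1] >= threshold) for (y0, y1) in joint}


def pointed_entry_entropy(joint, alpha):
    """H_inf of (pointed entry, C) for the pointer defined above."""
    pointer = split_pointer(joint, alpha)
    dist = defaultdict(float)
    for (y0, y1), p in joint.items():
        c = pointer[(y0, y1)]
        # c = 0: y1 itself is unlikely, so Y1 is the hard-to-guess entry.
        # c = 1: y1 is likely, so the ignorance must sit in Y0.
        entry = y1 if c == 0 else y0
        dist[(entry, c)] += p
    return min_entropy(dist)
```

For a joint distribution with $H_\infty(Y_0Y_1) \geq \alpha$, the value returned by `pointed_entry_entropy` is at least $\alpha/2$: heavy values of $y_1$ number at most $2^{\alpha/2}$, each contributing at most $2^{-\alpha}$, while light values have weight below $2^{-\alpha/2}$ by definition.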



The fact that min-entropy splitting holds in any NC-HV model now follows as a corollary from Theorem B.2 and the fact that the chain rule (A7) and monotonicity (A6) of the min-entropy also hold for NC-HV models.



Corollary B.3. Let a NC-HV model $\mathfrak{M}$ be given, with corresponding set of hidden variables $\Lambda$. Let $Y = Y_0Y_1$ be two classical random variables each taking values in a finite set $\mathcal{Y}$, and $\{\mathcal{P}_{y_0y_1}\}_{(y_0,y_1)\in\mathcal{Y}^2}$ a corresponding fixed set of preparations on a register $E$. Then there exists an extended model $\mathfrak{M}'$ over the set of hidden variables $\Lambda' = \Lambda \times \{0_c, 1_c\}$, and a set of preparations $\mathcal{P}_{y_0 y_1 c}$, for $c \in \{0,1\}$, extending the $\mathcal{P}_{y_0y_1}$ such that

$$H_\infty(Y_C|EC) \geq \frac{H_\infty(Y_0Y_1|E)}{2} - 1 . \tag{B21}$$



To see that this inequality is robust is now again an immediate consequence of the chain rule (A7) and monotonicity property (A6), which tell us that when we obtain some additional classical information $A = a$ with $a \in \mathcal{A}$ we have

$$\begin{aligned}
H_\infty(Y_C|EAC) &\geq H_\infty(Y_C A|EC) - \log|\mathcal{A}| && \text{(B22)} \\
&\geq H_\infty(Y_C|EC) - \log|\mathcal{A}| && \text{(B23)} \\
&\geq H_\infty(Y_0Y_1|E)/2 - \log|\mathcal{A}| - 1 . && \text{(B24)}
\end{aligned}$$

That is, a secretly helpful NC-HV leaking a small number $m = \log|\mathcal{A}|$ bits of additional information does not decrease the min-entropy by more than $\log|\mathcal{A}|$ bits.



Appendix C: Splitting is violated by quantum 
mechanics 



We are now ready to show that the splitting inequality (B8) is violated by quantum mechanics in a very strong sense. To this end, we first construct a particular quantum encoding of two dits into one qudit.

1. The encoding

Consider the encoding $E : \{0, \ldots, d-1\}^2 \to \mathbb{C}^d$ given by

$$E(y_0, y_1) = |\Psi_{y_0y_1}\rangle := X_d^{y_0} Z_d^{y_1} |\Psi\rangle , \tag{C1}$$


where $X_d$ and $Z_d$ are the generalized Pauli matrices given by their actions on an orthonormal basis $\{|y_0\rangle, y_0 \in \{0, \ldots, d-1\}\}$

$$X_d |y_0\rangle = |y_0 + 1 \bmod d\rangle \tag{C2}$$
$$Z_d |y_0\rangle = \omega^{y_0} |y_0\rangle , \tag{C3}$$

with $\omega = \exp(2\pi i/d)$, and

$$|\Psi\rangle := \frac{1}{\sqrt{2\left(1 + \frac{1}{\sqrt{d}}\right)}} \left( |0\rangle + F|0\rangle \right) , \tag{C4}$$

with $F$ denoting the quantum Fourier transform operator over $\mathbb{Z}_d$. Note that $X_d = F Z_d F^\dagger$. We also refer to the eigenbasis of $Z_d$ as the computational basis and the eigenbasis of $X_d$ as the Fourier basis. Below, it will be convenient to note that $Z_d$ acts as the cyclic shift operator in the eigenbasis of $X_d$, and vice versa. Throughout, we will assume that d is prime.
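
The objects just defined are small enough to build explicitly. The sketch below is a numerical sanity check in plain Python, with vectors as lists of complex amplitudes; the matrix convention $F_{jk} = \omega^{jk}/\sqrt{d}$ for the Fourier transform is an assumption of this sketch, and the helper names are hypothetical. It checks only properties that do not depend on that sign convention: the normalization of $|\Psi\rangle$, the invariance $F|\Psi\rangle = |\Psi\rangle$, and the commutation relation $Z_d X_d = \omega X_d Z_d$.

```python
import cmath
import math


def omega(d):
    return cmath.exp(2j * cmath.pi / d)


def apply_x(v):
    """X_d |y> = |y + 1 mod d>, so component j of X v is v[j - 1]."""
    d = len(v)
    return [v[(j - 1) % d] for j in range(d)]


def apply_z(v):
    """Z_d |y> = omega^y |y>."""
    d = len(v)
    w = omega(d)
    return [w ** j * v[j] for j in range(d)]


def apply_f(v):
    """Fourier transform with the assumed convention F[j][k] = omega^(jk)/sqrt(d)."""
    d = len(v)
    w = omega(d)
    return [sum(w ** (j * k) * v[k] for k in range(d)) / math.sqrt(d) for j in range(d)]


def psi(d):
    """Normalized |Psi> proportional to |0> + F|0>."""
    e0 = [1.0 if j == 0 else 0.0 for j in range(d)]
    raw = [a + b for a, b in zip(e0, apply_f(e0))]
    norm = math.sqrt(sum(abs(a) ** 2 for a in raw))
    return [a / norm for a in raw]


def close(u, v, tol=1e-9):
    return all(abs(a - b) < tol for a, b in zip(u, v))
```

For example, `close(apply_f(psi(5)), psi(5))` holds, reflecting that $|\Psi\rangle$ is an equal superposition of a computational and a Fourier basis vector that $F$ exchanges.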

Imagine a source that chooses $y_0, y_1 \in \mathcal{Y} := \{0, \ldots, d-1\}$ uniformly at random and emits $|\Psi_{y_0y_1}\rangle$, corresponding to the ccq-state

$$\rho_{Y_0Y_1E} = \frac{1}{d^2} \sum_{y_0,y_1} |y_0\rangle\langle y_0|_{Y_0} \otimes |y_1\rangle\langle y_1|_{Y_1} \otimes |\Psi_{y_0y_1}\rangle\langle\Psi_{y_0y_1}|_E . \tag{C5}$$

Throughout, we will consider the probability that we guess $Y_0Y_1$ or the individual entries $Y_0$ and $Y_1$ given the register $E$. We begin by showing that for our specific encoding the probability of guessing both entries $Y_0, Y_1$ is small.



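Lemma C.1. For the ccq-state $\rho_{Y_0Y_1E}$ given by (C5)

$$P_{\rm guess}(Y_0Y_1|E) = \frac{1}{d} . \tag{C6}$$

Proof. Computing the probability of guessing both entries is equivalent to solving the semidefinite program (SDP)

maximize $\frac{1}{d^2} \sum_{y_0,y_1} \mathrm{tr}\left(M_{y_0y_1} |\Psi_{y_0y_1}\rangle\langle\Psi_{y_0y_1}|\right)$
subject to $M_{y_0y_1} \geq 0$ for all $y_0, y_1$,
$\sum_{y_0,y_1} M_{y_0y_1} = \mathbb{I}$.

The dual SDP is easily found to be

minimize $\mathrm{tr}(Q)$
subject to $Q \geq \frac{1}{d^2} |\Psi_{y_0y_1}\rangle\langle\Psi_{y_0y_1}|$ for all $y_0, y_1$.

Let $v_{\rm primal}$ and $v_{\rm dual}$ be the optimal values of the primal and dual respectively. Note that by weak duality we have $v_{\rm dual} \geq v_{\rm primal}$. Since $|\Psi_{y_0y_1}\rangle\langle\Psi_{y_0y_1}|$ is a pure state, $Q = \mathbb{I}/d^2$ is a feasible dual solution with value $\mathrm{tr}(Q) = 1/d$.

We now show that $Q$ is in fact optimal, by constructing a solution to the primal that achieves the same value. Let $M_{y_0y_1} = |\Psi_{y_0y_1}\rangle\langle\Psi_{y_0y_1}|/d$. Clearly, $M_{y_0y_1} \geq 0$ for all $y_0$ and $y_1$, and by Schur's lemma we have

$$\sum_{y_0,y_1} M_{y_0y_1} = \frac{1}{d} \sum_{y_0,y_1} X_d^{y_0} Z_d^{y_1} |\Psi\rangle\langle\Psi| \left(X_d^{y_0} Z_d^{y_1}\right)^\dagger = \mathbb{I} . \tag{C8}$$

Hence, our choice of operators is a feasible primal solution with primal value 1/d, which concludes our claim. □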
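
The primal certificate used in this proof can be checked numerically for small $d$. The sketch below is a sanity check only and does not solve the SDP; the helper names are hypothetical. It builds the encoded states directly from (C1), sums the operators $M_{y_0y_1} = |\Psi_{y_0y_1}\rangle\langle\Psi_{y_0y_1}|/d$, and evaluates the primal objective, which should equal the dual value $\mathrm{tr}(\mathbb{I}/d^2) = 1/d$.

```python
import cmath
import math


def encoded_state(d, y0, y1):
    """Amplitudes of X^{y0} Z^{y1} |Psi>, with |Psi> proportional to |0> + F|0>."""
    w = cmath.exp(2j * cmath.pi / d)
    # F|0> is the uniform superposition, so |0> + F|0> has real amplitudes.
    raw = [(1.0 if j == 0 else 0.0) + 1 / math.sqrt(d) for j in range(d)]
    norm = math.sqrt(sum(a * a for a in raw))
    base = [a / norm for a in raw]
    after_z = [w ** (j * y1) * base[j] for j in range(d)]
    return [after_z[(j - y0) % d] for j in range(d)]  # X shifts |j> to |j+1>


def primal_certificate(d):
    """Return (sum of all M_{y0y1}, primal objective) for M = |Psi><Psi| / d."""
    total = [[0j] * d for _ in range(d)]
    value = 0.0
    for y0 in range(d):
        for y1 in range(d):
            s = encoded_state(d, y0, y1)
            for a in range(d):
                for b in range(d):
                    total[a][b] += s[a] * s[b].conjugate() / d
            overlap = sum(abs(x) ** 2 for x in s)  # <Psi|Psi>, equal to 1
            value += (overlap ** 2 / d) / d ** 2   # tr(M |Psi><Psi|) / d^2
    return total, value
```

For $d = 3$ the returned matrix agrees with the identity up to rounding and the objective evaluates to $1/3$, matching (C6).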

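We now show that the probability of retrieving any of the individual entries $Y_0$ and $Y_1$ is nevertheless quite large. To this end, let us first establish the following simple lemma.

Lemma C.2. For the encoding defined in (C1) we have for all $y_0, y_1 \in \{0, \ldots, d-1\}$

$$|\langle y_0|\Psi_{y_0y_1}\rangle|^2 = \frac{1}{2} + \frac{1}{2\sqrt{d}} , \tag{C9}$$
$$|\langle y_1|F^\dagger|\Psi_{y_0y_1}\rangle|^2 = \frac{1}{2} + \frac{1}{2\sqrt{d}} . \tag{C10}$$

Proof. First of all, note that for all $y_0$ and $y_1$

$$\begin{aligned}
\langle y_0|\Psi_{y_0y_1}\rangle &= \langle 0|(X_d^{y_0})^\dagger|\Psi_{y_0y_1}\rangle && \text{(C11)} \\
&= \langle 0|Z_d^{y_1}|\Psi\rangle = \langle 0|\Psi\rangle , && \text{(C12)}
\end{aligned}$$

where we have used the fact that $Z_d|0\rangle = |0\rangle$. Similarly, we have

$$\begin{aligned}
\langle y_1|F^\dagger|\Psi_{y_0y_1}\rangle &= \omega^{-y_1 y_0} \langle y_1|F^\dagger Z_d^{y_1}|\Psi\rangle && \text{(C13)} \\
&= \omega^{-y_1 y_0} \langle 0|F^\dagger (Z_d^{y_1})^\dagger Z_d^{y_1}|\Psi\rangle && \text{(C14)} \\
&= \omega^{-y_1 y_0} \langle 0|F^\dagger|\Psi\rangle && \text{(C15)} \\
&= \omega^{-y_1 y_0} \langle 0|\Psi\rangle , && \text{(C16)}
\end{aligned}$$

with $\omega = \exp(2\pi i/d)$, where the first equality follows from the fact that $F|y_1\rangle$ is an eigenvector of $X_d$, and the last equality by noting that $F|\Psi\rangle = |\Psi\rangle$. It thus remains to compute

$$\begin{aligned}
\langle 0|\Psi\rangle &= \frac{1}{\sqrt{2\left(1 + \frac{1}{\sqrt{d}}\right)}} \left( \langle 0|0\rangle + \langle 0|F|0\rangle \right) && \text{(C17)} \\
&= \sqrt{\frac{1}{2} + \frac{1}{2\sqrt{d}}} , && \text{(C18)}
\end{aligned}$$

from which our claim follows. □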
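
Both overlaps in Lemma C.2 can be confirmed numerically for small $d$. The following sketch assumes the Fourier convention $F_{jk} = \omega^{jk}/\sqrt{d}$, under which projecting onto $F|y_1\rangle$ recovers $y_1$; with the opposite sign convention the label $y_1$ would be replaced by $-y_1 \bmod d$. The function names are hypothetical.

```python
import cmath
import math


def encoded_state(d, y0, y1):
    """Amplitudes of X^{y0} Z^{y1} |Psi>, with |Psi> proportional to |0> + F|0>."""
    w = cmath.exp(2j * cmath.pi / d)
    raw = [(1.0 if j == 0 else 0.0) + 1 / math.sqrt(d) for j in range(d)]
    norm = math.sqrt(sum(a * a for a in raw))
    base = [a / norm for a in raw]
    after_z = [w ** (j * y1) * base[j] for j in range(d)]
    return [after_z[(j - y0) % d] for j in range(d)]


def worst_deviation(d):
    """Largest gap between either overlap and 1/2 + 1/(2 sqrt(d)) over all (y0, y1)."""
    w = cmath.exp(2j * cmath.pi / d)
    target = 0.5 + 1 / (2 * math.sqrt(d))
    worst = 0.0
    for y0 in range(d):
        for y1 in range(d):
            s = encoded_state(d, y0, y1)
            # Computational basis: probability of reading off y0.
            p_comp = abs(s[y0]) ** 2
            # Fourier basis: overlap with F|y1> (assumed convention, see lead-in).
            f_col = [w ** (j * y1) / math.sqrt(d) for j in range(d)]
            p_four = abs(sum(f_col[j].conjugate() * s[j] for j in range(d))) ** 2
            worst = max(worst, abs(p_comp - target), abs(p_four - target))
    return worst
```

The deviation is at the level of floating-point rounding for every dimension tried, so each entry is recovered with the same probability regardless of the value of the other entry.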



It is now straightforward to compute the maximum probabilities that we retrieve $Y_0$ and $Y_1$ correctly.

Lemma C.3. For the cq-states $\rho_{Y_0E}$ and $\rho_{Y_1E}$ given by the reduced states of (C5) we have

$$P_{\rm guess}(Y_0|E) = P_{\rm guess}(Y_1|E) = \frac{1}{2} + \frac{1}{2\sqrt{d}} . \tag{C19}$$

Proof. We first show our claim for $P_{\rm guess}(Y_0|E)$. Consider the state corresponding to an encoding of $y_0$ given by $\sigma_{y_0} := \frac{1}{d} \sum_{y_1} |\Psi_{y_0y_1}\rangle\langle\Psi_{y_0y_1}|$. As before, we can express the winning probability as an SDP with primal

maximize $\frac{1}{d} \sum_{y_0} \mathrm{tr}\left(M_{y_0} \sigma_{y_0}\right)$
subject to $M_{y_0} \geq 0$ for all $y_0$,
$\sum_{y_0} M_{y_0} = \mathbb{I}$.

We now show that without loss of generality, the optimal measurement has an extremely simple form. First of all note that $[\sigma_{y_0}, Z_d^a] = 0$ for all $a$ and $y_0$ since

$$Z_d^a\, \sigma_{y_0}\, (Z_d^a)^\dagger = \frac{1}{d} \sum_{y_1} X_d^{y_0} Z_d^{y_1 + a} |\Psi\rangle\langle\Psi| \left(X_d^{y_0} Z_d^{y_1 + a}\right)^\dagger = \sigma_{y_0} . \tag{C20}$$

Hence, if $\{M_{y_0}\}_{y_0}$ is an optimal solution then so is the measurement given by $\hat{M}_{y_0} = \frac{1}{d} \sum_a Z_d^a M_{y_0} (Z_d^a)^\dagger$. Thus without loss of generality we may assume that the optimal measurement operators are diagonal in the computational basis. Now consider the largest term corresponding to $M_{\max}$ and $\sigma_{\max}$ such that

$$\mathrm{tr}\left(M_{\max} \sigma_{\max}\right) \geq \mathrm{tr}\left(M_{y_0} \sigma_{y_0}\right) \tag{C21}$$

for all $y_0$. Since all measurement operators are Hermitian, we can expand $M_{\max} = \sum_j m_j |j\rangle\langle j|$ in its eigenbasis. We may now in turn consider the element $|j\rangle\langle j|$ which has the largest overlap with $\sigma_{\max}$. That is, choose

$$m := \mathrm{argmax}_j\, \langle j|\sigma_{\max}|j\rangle , \tag{C22}$$

that is, $\langle m|\sigma_{\max}|m\rangle \geq \langle j|\sigma_{\max}|j\rangle$ for all $j$. Clearly, we have that

$$P_{\rm guess}(Y_0|E) \leq \langle m|\sigma_{\max}|m\rangle . \tag{C23}$$

It remains to prove that this inequality is tight. Without loss of generality assume that $\sigma_{\max} = \sigma_0$; any other case will follow by a simple relabeling. Note that by Lemma C.2 we have

$$\langle y_0|\sigma_0|y_0\rangle \leq \langle 0|\sigma_0|0\rangle = \frac{1}{2} + \frac{1}{2\sqrt{d}} \tag{C24}$$

for all $y_0$ and thus we choose $m = 0$ in (C22). Note that by construction we have $\sigma_{y_0} = X_d^{y_0} \sigma_0 (X_d^{y_0})^\dagger$, and hence $\langle y_0|\sigma_{y_0}|y_0\rangle = \langle 0|\sigma_0|0\rangle$. Thus for the measurement in the computational basis given by $M_{y_0} = |y_0\rangle\langle y_0|$, the inequality (C23) is tight, which together with (C24) gives our claim. The case of retrieving $Y_1$ is exactly analogous, with the roles of $X_d$ and $Z_d$ interchanged. □

2. Min-entropy splitting

We are ready to show that the min-entropy splitting inequality (B8) is violated for the ccq-state given in (C5).

Theorem C.4. For the ccq-state given in (C5), we have that for any ccqc state $\rho_{Y_0Y_1EC}$ with $\dim(C) = 2$ satisfying $\mathrm{tr}_C(\rho_{Y_0Y_1EC}) = \rho_{Y_0Y_1E}$,

$$P_{\rm guess}(Y_c|EC = c) \geq \frac{1}{2} + \frac{1}{2\sqrt{d}} \tag{C25}$$

for all $c \in \{0, 1\}$.

Proof. Note that we may express

$$\rho_{Y_0Y_1EC} = \frac{1}{d^2} \sum_{y_0,y_1} |y_0\rangle\langle y_0| \otimes |y_1\rangle\langle y_1| \otimes \rho^{EC}_{y_0y_1} . \tag{C26}$$



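We now first note that by the reduced trace condition and the fact that $Y_0$ and $Y_1$ are classical we must have that $\mathrm{tr}_C(\rho^{EC}_{y_0y_1}) = |\Psi_{y_0y_1}\rangle\langle\Psi_{y_0y_1}|$. Since $|\Psi_{y_0y_1}\rangle\langle\Psi_{y_0y_1}|$ is a pure state, this implies that $\rho^{EC}_{y_0y_1} = |\Psi_{y_0y_1}\rangle\langle\Psi_{y_0y_1}| \otimes \sigma^C_{y_0y_1}$. Since $C$ is classical, we can express $\sigma^C_{y_0y_1}$ without loss of generality in the computational basis as

$$\sigma^C_{y_0y_1} = q_{y_0y_1} |0\rangle\langle 0| + (1 - q_{y_0y_1}) |1\rangle\langle 1| \tag{C27}$$

for some arbitrary distribution $\{q_{y_0y_1}, 1 - q_{y_0y_1}\}$.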

Let us now consider how well we can compute $P_{\rm guess}(Y_0|EC = 0)$; the case of $C = 1$ is analogous. First of all, note that the state obtained from $\rho_{Y_0Y_1EC}$ after we measured $C$ in the computational basis and obtained outcome $C = 0$, followed by tracing out $C$, is given by

$$\rho_{Y_0Y_1E|C=0} = \frac{1}{q} \sum_{y_0,y_1} q_{y_0}\, q_{y_1|y_0}\, |y_0\rangle\langle y_0| \otimes |y_1\rangle\langle y_1| \otimes |\Psi_{y_0y_1}\rangle\langle\Psi_{y_0y_1}| , \tag{C28}$$

where $q = \sum_{y_0y_1} q_{y_0y_1}$ and $q_{y_0y_1} = (1/d^2)\, q_{y_0}\, q_{y_1|y_0}$. The states we wish to distinguish are thus given by

$$\sigma_{y_0|C=0} = \frac{1}{\sum_{y_1} q_{y_1|y_0}} \sum_{y_1} q_{y_1|y_0}\, |\Psi_{y_0y_1}\rangle\langle\Psi_{y_0y_1}| . \tag{C29}$$

Note that from Lemma C.2 we have that for all $y_0$

$$\langle y_0|\sigma_{y_0|C=0}|y_0\rangle = \frac{1}{2} + \frac{1}{2\sqrt{d}} . \tag{C30}$$

Hence, for the measurement in the computational basis we succeed with probability at least $1/2 + 1/(2\sqrt{d})$, independent of the distributions $\{q_{y_0y_1}, 1 - q_{y_0y_1}\}$. Again, by exchanging the roles of $X_d$ and $Z_d$ the same probability can be achieved using a measurement in the Fourier
basis, which proves the theorem. □

In terms of min-entropy, we thus have that $H_\infty(Y_0Y_1|E) = \log d$ but for all $C$ we have $H_\infty(Y_C|EC) \approx 1$! This effect is still observed for the $\varepsilon$-smooth min-entropy for small $\varepsilon$, since

$$H^\varepsilon_\infty(Y_0Y_1|E) \geq H_\infty(Y_0Y_1|E) = \log d , \tag{C31}$$

and

$$-\log\left(P_{\rm guess}(Y_C|EC) - \varepsilon\right) \geq H^\varepsilon_\infty(Y_C|EC) . \tag{C32}$$
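
To get a feeling for the size of the violation, the following sketch tabulates the relevant quantities for a given $d$, using only the closed-form values from Lemma C.1 and Theorem C.4. The comparison against the right-hand side of (B21) is our illustrative choice of benchmark, and the function name is hypothetical.

```python
import math


def entropy_summary(d):
    """Return (H_inf of the whole, upper bound on H_inf of a part, NC-HV demand)."""
    whole = math.log2(d)  # H_inf(Y0 Y1 | E) = log d, from P_guess = 1/d
    # Theorem C.4: P_guess(Y_C | EC) >= 1/2 + 1/(2 sqrt(d)), hence this upper bound.
    part_upper = -math.log2(0.5 + 1 / (2 * math.sqrt(d)))
    demand = whole / 2 - 1  # right-hand side of (B21)
    return whole, part_upper, demand
```

The upper bound on the min-entropy of a part stays below one bit for every $d$, while the benchmark grows like $(\log d)/2$; under these formulas the two cross between $d = 5$ and $d = 17$, and the gap then widens without bound.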



